1.0 Introduction 


The need for intelligent robots that are able to assist with operations in space will continue to 
increase as the human presence in extraterrestrial environments expands. 1-6 There are several 
reasons why the development of such robotic devices, operating with varying degrees of 
autonomy, will be a critical step toward achieving this goal. Foremost among these reasons is that 
extended operations in space by humans require complex life support systems and shielding from 
hazards such as radiation. This means that the time which an astronaut may devote to tasks outside 
an orbiting vehicle or space station is an exceptionally valuable resource that should be allocated to 
tasks requiring a high degree of human intelligence. 

Many of the tasks that will be required to achieve a particular goal will not demand such high 
levels of intelligence, however. For example, a crew member who is servicing a satellite or space 
station might need a particular tool or replacement unit to be fetched. This is a task that would be 
appropriately delegated to a spatially mobile robot that has the ability to recognize objects, estimate 
their spatial poses, grasp, and retrieve them. These are precisely the types of objectives that the 
Extravehicular Activity Helper/Retriever (EVAHR) is envisioned to achieve. 

The EVAHR is a robotic device currently being developed by the Automation and Robotics 
Division at the NASA Johnson Space Center to support activities in the neighborhood of Space 
Station Freedom. Its primary responsibilities will be to retrieve tools, equipment, or other objects 
which may become detached from the spacecraft, or to rescue a crew member who may have been 
inadvertently de-tethered. Later goals will include cooperative operations between a crew member 
and the EVAHR, such as holding a light to illuminate a work area, exchanging an Orbital 
Replacement Unit (ORU), or maintaining equipment. 

In order to be able to perform such tasks, it is clear that the EVAHR must be able to reason 
about its operational environment based on the input obtained from one or more sensors. This 
input is generally extracted from sensors that are capable of providing intensity and/or range 
information, and there are advantages and drawbacks for each of these sensory domains depending 
on the processing goal or the types of objects about which reasoning is to be performed. For 
example, a laser scanner is directly able to extract three-dimensional coordinates from an observed 
object whereas considerable computationally complex processing is necessary if only an intensity 
based imaging system is employed using a classical method such as shape-from-shading. These 
three-dimensional coordinates can be used to recognize the object based on its geometry or to 
estimate its spatial pose (location and orientation). On the other hand, certain objects, such as 
those covered with highly reflective material, do not provide good return signals for the laser 
scanner, thus minimizing its usefulness in such cases. However, there are certain intensity based 
algorithms that make the estimation of spatial pose a straightforward and computationally 
inexpensive process if the right geometry exists among four or more extracted features in the 
intensity image. 

The examples cited in the preceding paragraph are illustrative of the need for a Vision System 
Planner (VSP) that is capable of selecting a sensor based on knowledge of sensor capabilities and 
object characteristics. The justifications for a VSP extend far beyond sensor selection, however, 
since once a sensor has been selected it may need to be reoriented to obtain a better view of a target 
object. In some cases, physical characteristics of the sensor such as scanning rate or effective 
resolution may need to be altered so that the data can be acquired more rapidly or such that feature 
location estimates can be improved. Once a sensor has been selected and configured for the task at 
hand, the VSP should also be capable of selecting an appropriate algorithm to achieve the current 
vision system goal based on what is known about the sensor configuration, the characteristics of 
the objects being reasoned about, and the state of the operational environment as represented in a 
world model. Figures 1 and 2 show the fundamental functional components of the VSP and its 
relationship to the higher level Task Planner. 

The remainder of this report is divided into four sections that describe research progress toward 
the development of such a Vision System Planner and make recommendations for future related 
research. Section 2 reviews the initial study of the vision system architecture. The details of the 
initial VSP design are documented in a paper entitled "A Vision System Planner for the 
Extravehicular Activity Retriever," which was published in the Proceedings of the International 
Conference on Intelligent Autonomous Systems, and thus section 2 may be skipped by the reader 
who has read that paper or who is familiar with the research performed during the summer of 
1992. Section 3 details the implementation phase of the follow-on research in which many of the 
approaches developed in the initial study were realized on available intensity image processing 
hardware mounted on a mobile robot platform. Section 4 extends the study of the vision system 
architecture beyond the limitations of the available sensory and robotic hardware by incorporating 
synthetically generated range images and demonstrates how a moderate amount of range data 
processing can facilitate the recognition process. Section 5 discusses how the vision system can 
plan sequences of actions relating to object recognition and object pose estimation using 
complementary sensors and a variety of algorithmic options to accomplish a current visual 
objective and makes recommendations for continuing research relating to the Vision System 
Planner. 




Figure 1: Planning System Architecture 












2.0 Vision System Planner Design Considerations 


2.1 Background for the Initial Design of the VSP 

The planning mechanisms developed for the initial VSP were founded on the assumption that 
there should be at least two visual sensors which provide intensity and range images. There are 
several reasons why such a multisensory approach is desirable, three of which are particularly 
significant. First, the availability of sensors with complementary capabilities permits the VSP to 
select a sensor/algorithm combination that is most appropriate for achieving the current visual goal 
as specified by the task planner. Second, if the sensor that the VSP would normally select as its 
first choice to achieve the goal is either unavailable or inappropriate for usage because of some 
current constraint, it may be possible to perform the desired task using the other sensor to achieve 
the same goal, albeit perhaps by accepting a penalty in performance. Finally, instances may occur 
for which it is desirable to verify results from two different sensory sources rather than relying on 
the inferences based on data obtained from a single sensor. 

The first of the above motivations addresses the need to achieve the visual goal in the most 
effective manner by allowing the VSP to choose among sensors with complementary capabilities. 
For example, if it is desired to distinguish between two objects of similar structure with the color 
of the objects being the primary differentiating feature, then it is apparent that the color camera 
should be used as the primary sensor. On the other hand, if the size and/or geometry of the objects 
are most useful for determining identity, then it is important to be able to expeditiously extract and 
process three-dimensional coordinates. Clearly, this is a task that would be most properly assigned 
to the laser scanner. Similarly, tasks involving pose estimation, 8 object tracking, 9 and motion 
estimation 10 would more appropriately involve invoking the laser scanner as the primary sensor. 
The initial versions of these submodules are under development and are to be tested in a reduced 
gravity environment using NASA's KC-135 aircraft. 11 

The previous example involving the need for three-dimensional coordinates is illustrative of a 
case in which the primary sensor (the laser scanner) is engaged to extract the required information. 
However, there may be cases for which the laser scanner cannot be used to obtain range 
information because (a) the object to be processed is covered with a highly specularly reflective 
material thus preventing acquisition of good return signals, (b) the laser scanner is currently 
assigned to another task, or (c) the laser scanner is temporarily not functioning properly. For such 
instances, it is highly desirable to provide a redundant capability by using the other sensor if 
possible. The classical method for determining three-dimensional coordinates from intensity 
images involves a dual (stereo vision) camera setup in which feature correspondences are 
established and the stereo equations are solved for each pair of feature points. Although the 
assumed configuration has only one intensity image camera, this alternative mechanism for 
computing range values is in fact possible for the VSP to achieve by requesting the task planner to 
reposition the EVAHR such that the camera’s initial and final positions are offset by a known 
baseline distance. Of course, there is a penalty in performance if this (pseudo) stereo vision 
method is chosen, since the EVAHR must be moved and feature correspondences computed. 
However, it is nevertheless important to have such a redundant sensing capability for the reasons 
previously mentioned and to be able to independently verify the results obtained from one sensor 
or to increase the confidence of those results. 

Aside from selecting an appropriate sensor, it may also be possible to alter certain physical 
characteristics of the sensor such as the effective resolution and scanning rate. In the case of the 
laser scanner, images can be acquired at rates (and resolutions) varying between 2.5 frames per 
second (256 x 256 pixels) and 10 frames per second (64 x 256 pixels). The capability to select a 
faster frame rate with a penalty in resolution becomes significant if it is important to be able to 
sense and process data rapidly, as in the case of motion parameter estimation. On the other hand, if 
an object is relatively stationary and finer features are to be sensed, then higher resolution with a 
lower frame rate would be chosen. Hence, a vision system planner should be able to select a 
sensor as well as its relevant parameters (e.g. scanning rate, resolution, zoom factor, orientation). 

Once an appropriate sensor has been selected and configured, the next step is to focus attention 
on the object(s) and to apply a preprocessing algorithm that will effectively achieve the current 
goal. Focusing attention is important because it reduces the amount of image data that must be 
processed for the immediate task. If the task is tracking an image blob that corresponds to an 
object, then [...] occlusion [...] the object's predicted location (computed by the adaptive image 
blob tracker) is central to assisting in the segmentation of sub-blobs. 9 The selection of a pose 
estimation algorithm is directly dependent on the model being processed. There are two 
fundamental classes of pose estimation algorithms currently employed, namely, object-based and 
image-based (multi-view) pose estimation. If an object contains curved surfaces (e.g. a cylinder) 
then an image-based approach is taken by which the occluding contours derived from several 
views of the object that were recorded on a tessellated sphere are used as the basis for matching 
the observed object’s outline. If the object has a polyhedral structure (no curved surfaces) then an 
object-based pose estimation algorithm is employed, by which features extracted from images are 
matched against model features. For situations in which the object is close to the sensor (e.g. 
during grasping), the pose may be estimated on subparts of the entire object rather than the entire 
object. For purposes of recognition, the subset of object features selected and the algorithm 
chosen are also a function of the size of objects in images. 
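
The trade-off between frame rate and resolution described above for the laser scanner can be made concrete with a short sketch. The two operating points are those quoted in the text. The names ScanMode and choose_scan_mode, and the rule that a moving target favors frame rate, are assumptions introduced here for illustration and are not taken from the VSP implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ScanMode:
    """One laser scanner operating point (hypothetical container)."""
    image_size: Tuple[int, int]   # pixel dimensions as quoted in the text
    frames_per_second: float

    @property
    def pixels_per_second(self) -> float:
        # Total number of range samples delivered per second in this mode.
        return self.image_size[0] * self.image_size[1] * self.frames_per_second


# The two end points quoted in the text.
HIGH_RESOLUTION = ScanMode((256, 256), 2.5)
HIGH_RATE = ScanMode((64, 256), 10.0)


def choose_scan_mode(target_is_moving: bool) -> ScanMode:
    """Assumed rule: favor frame rate for moving targets, resolution otherwise."""
    return HIGH_RATE if target_is_moving else HIGH_RESOLUTION


if __name__ == "__main__":
    for moving in (True, False):
        mode = choose_scan_mode(moving)
        print(moving, mode.image_size, mode.frames_per_second)
```

Both quoted operating points deliver the same number of range samples per second (163,840), which suggests that the choice is a redistribution of a fixed sampling budget between spatial and temporal resolution.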

Proximity to target objects will affect not only the features selected for recognition and pose 
estimation but will strongly influence the confidences associated with the results computed. For 
example, a typical scenario might involve a case in which the EVAHR is close enough to a target 
object to hypothesize its class based on color, but too far away to definitively recognize its 
geometric structure using laser scanner data. In this case, the VSP would tentatively identify the 
object (using color) and would advise the task planner to move closer to the object so that a laser 
scanner image with higher resolution can be obtained. The confidence of the initial hypothesis 
would then be strengthened (or perhaps weakened) depending on the conclusion reached by 
processing the range data at close proximity. This capability is illustrative of the necessity for the 
VSP to be able to plan high level vision tasks as well as to be able to interact (interface) with the 
higher level task planner in order to reposition the EVAHR. Hence, at the highest level of vision 
system planning, the VSP will be responsible for task scheduling and resource planning. 

The fundamental architecture for the Vision System includes modules which are designed to 
detect, recognize, track, and estimate the pose of objects. Upon receiving a request from the main 
task planner to achieve one of these objectives, the Vision System Planner determines an 
appropriate sequence of goals and subgoals that, when executed, will accomplish the objective. 
The plan generated by the VSP will generally involve (a) choosing an appropriate sensor, (b) 
selecting an efficient and effective algorithm to process the image data, (c) communicating the 
nominal (expected) results to the task planner or informing the task planner of anomalous 
(unexpected) conditions or results, and (d) advising the task planner of actions that would assist 
the vision system in achieving its objectives. The specific plan generated by the VSP will primarily 
depend on knowledge relating to the sensor models (e.g. effective range of operation, image 
acquisition rate), the object models (e.g. size, reflectivity, color), and the world model (e.g. 
expected distance to and attitude of objects). The next subsection presents the resulting plans 
generated by the VSP for several different scenarios. 
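
The sensor and algorithm choices discussed in this section can be summarized as a small set of rules. The sketch below is illustrative only: the ObjectModel fields, the goal strings, and the function names are hypothetical, and the rules are a simplified reading of the text (color camera when color is the distinguishing feature, laser scanner for geometry unless it is unavailable or the object is specularly reflective, and image-based versus object-based pose estimation according to surface type).

```python
from dataclasses import dataclass


@dataclass
class ObjectModel:
    """Hypothetical subset of the object model knowledge base."""
    name: str
    specular: bool               # highly specularly reflective surface
    has_curved_surfaces: bool    # e.g. a cylinder
    color_is_distinctive: bool   # color is the primary differentiating feature


def select_sensor(obj: ObjectModel, goal: str, laser_available: bool = True) -> str:
    """Pick a sensor for a goal such as 'recognize', 'range', or 'pose'."""
    if goal == "recognize" and obj.color_is_distinctive:
        return "color_camera"
    if laser_available and not obj.specular:
        return "laser_scanner"
    # Redundant capability: single camera moved over a known baseline.
    return "color_camera_pseudo_stereo"


def select_pose_algorithm(obj: ObjectModel) -> str:
    """Curved objects use image-based matching, polyhedral ones object-based."""
    return "image_based_multi_view" if obj.has_curved_surfaces else "object_based"


if __name__ == "__main__":
    oru = ObjectModel("ORU", specular=False, has_curved_surfaces=False,
                      color_is_distinctive=True)
    print(select_sensor(oru, "range"))
    print(select_sensor(oru, "range", laser_available=False))
    print(select_pose_algorithm(oru))
```

A fuller planner would also weigh the sensor and world models (effective range, acquisition rate, expected distance), which are omitted here for brevity.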

2.2 Scenarios Illustrating the Operation of the VSP 

The operation of the prototype VSP that was initially designed and implemented can best be 
understood by examining the plans generated for various scenarios. For purposes of illustration, 
the initial state of the world is always assumed to be that there are three objects somewhere in front 
of the EVAHR. One of the objects is an Orbital Replacement Unit (ORU) with a known uniform 
color. For cases in which the EVAHR VSP needs to search for the ORU, the hemisphere in front 
of the EVAHR is searched in the spiraling manner shown in Figure 3. The task planner (perhaps in 
consultation with the human operator) selects an angular field of view (i.e. zoom factor) for the 
color camera which affects (in an inversely proportional manner) the number of hemispherical 
sectors that must be searched (i.e. the smaller the angular field of view, the larger the number of 
hemispherical sectors). For example, if the angular field of view is chosen to be 45°, sectors near 
the center of the forward hemisphere (sectors 1-6 in Figure 3) are searched and if the ORU is not 
found, the extreme sectors (7-14) are searched in that order. 

Figure 3: hemispherical sector search order 

The scenarios that follow illustrate situations involving object detection, recognition, range 
estimation, and obstacle notification. 


Scenario 2.2.1 

Command received by the VSP: Search in front of the EVAHR for an ORU. 

Plan generated by the VSP: 

1. Search the hemisphere in front of the EVAHR by activating the color camera, fixing the 
effective focal length, and spiraling outward from the center until the object is found 
(Figures 4a, 4b). 

2. If the ORU is found, terminate the spiraling search and iteratively refine the estimate of 
where the object is located by adjusting the sensor gimbals toward the object and reducing 
the field of view (telephoto zoom) until the object is centered and large in the image 
(Figures 4c, 4d). If the ORU is not found, the VSP reports failure, after which there are 
several actions that could be taken. First, the forward hemisphere could be rescanned at 
higher magnification (a slower process since more scans will be required). Second, the 
forward hemisphere could be rescanned with increased illumination (requiring a decision to 
be made regarding its desirability in terms of overall objectives and power consumption by 
the illumination source). Finally, the VSP could request the Task Planner to rotate the 
EVAHR by 180 degrees and scan the rear hemisphere. 



Teleoperator command: RGB search for ORU 


Field of view angle = 50° 
Scan angle = 45° 

Figure 4a: search of sector 1 for ORU 



VSP response: ORU was found in sector 2 

Area of object was 150 pixels 

Figure 4b: search of sector 2 for ORU 



VSP action: Reorienting camera gimbals 

Setting field of view to 7° 

Figure 4c: first gimbal and zoom refinement 



VSP action: Reorienting camera gimbals 

Setting field of view to 4° 

Figure 4d: second gimbal and zoom refinement 






Teleoperator command: estimate range to ORU using laser scanner 

VSP response: estimated range to ORU is 18.5 feet 

Figure 5: laser scanner range estimation 



Teleoperator command: estimate range to ORU using pseudo-stereo 

VSP response: estimated range to ORU is 18.5 feet 

Figure 6: pseudo-range estimation 



Teleoperator input: move EVAR along optical axis 


Figure 7: moving EVAR toward the ORU 



Teleoperator input: check for obstacles in field of view 

VSP action: obstacle located ( identified by cursor ) 

Figure 8: checking for obstacles prior to moving EVAR 


Scenario 2.2.2 

Command received by the VSP: Determine the distance to the ORU, no sensor specified. 


Plan generated by the VSP: 

1. Locate the ORU as in Scenario 2.2.1 using the color camera. 

2. Examine the object model for an ORU and determine which sensor is the most appropriate 
to be used. In this case, since an ORU is not specularly reflective, the laser scanner is 
selected. 

3. Examine that part of the laser scanner image that corresponds to the region belonging to the 
ORU in the color image and compute the distance to those range image elements (Figure 5). 


Scenario 2.2.3 

Command received by the VSP: 

Determine the distance to the ORU, but force the estimation of distance using single camera 
lateral stereo vision. 

Plan generated by the VSP: 

1. Locate the ORU as in Scenario 2.2.1 using the color camera. 

2. Move the EVAHR left a known distance, take an image, and record the location of the ORU 
in that image. Then move the EVAHR right a known distance, take an image, and record 
the location of the ORU in that image. 

3. Using triangulation (stereo vision with two cameras separated by a known baseline 
distance) compute the distance to the ORU (Figure 6). 
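
The triangulation in step 3 of Scenario 2.2.3 can be sketched with the standard parallel-axis stereo relation, range = focal length x baseline / disparity. This is a minimal illustration under stated assumptions: the camera translates laterally without rotating, the focal length is expressed in pixels, and the function name and numerical values are hypothetical rather than taken from the VSP.

```python
def pseudo_stereo_range(focal_length_px: float,
                        baseline: float,
                        x_first_px: float,
                        x_second_px: float) -> float:
    """Range to a feature seen from two laterally offset camera positions.

    focal_length_px : effective focal length in pixels (assumed known)
    baseline        : distance between the two camera positions
    x_first_px, x_second_px : horizontal image coordinate of the same
                      feature in the first and second image
    The result is in the same units as the baseline.
    """
    disparity = abs(x_first_px - x_second_px)
    if disparity == 0:
        raise ValueError("zero disparity: baseline too small or target too distant")
    return focal_length_px * baseline / disparity


if __name__ == "__main__":
    # Hypothetical numbers: 925 pixel focal length, 2 foot baseline,
    # feature shifts by 100 pixels between the two images.
    print(pseudo_stereo_range(925.0, 2.0, 370.0, 270.0))  # 18.5 (feet)
```

The relation shows why this method carries a performance penalty: a distant target needs a longer baseline (more EVAHR motion) to produce a measurable disparity.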


Scenario 2.2.4 

Command received by the VSP: 

Determine the distance to the ORU and move toward the ORU along the optical axis 
of the color camera until the EVAHR is a specified distance (D) away from it. 

Plan generated by the VSP: 

1. Locate the ORU as in Scenario 2.2.1 using the color camera. 

2. Estimate the distance to the ORU (D_oru) using the laser scanner. 

3. Compute a vector along the optical axis of the color camera whose length is (D_oru - D). 
Transform that vector into EVAHR coordinates and move to that position, maintaining the 
same attitude (Figure 7). 
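
Step 3 of Scenario 2.2.4 amounts to scaling the optical axis direction by (D_oru - D) and rotating the result into EVAHR coordinates. The sketch below assumes that the optical axis is the z axis of the camera frame and that a 3 x 3 camera-to-EVAHR rotation matrix is available from the gimbal angles. Both conventions, and the function name, are assumptions made for illustration.

```python
from typing import List, Sequence


def approach_vector(d_oru: float,
                    d_standoff: float,
                    camera_to_evahr: Sequence[Sequence[float]],
                    optical_axis: Sequence[float] = (0.0, 0.0, 1.0)) -> List[float]:
    """Translation, in EVAHR coordinates, that leaves the EVAHR d_standoff
    away from the ORU along the camera's optical axis."""
    travel = d_oru - d_standoff
    if travel < 0:
        raise ValueError("already closer to the ORU than the requested distance")
    # Vector of length (D_oru - D) along the optical axis, camera frame.
    in_camera = [travel * a for a in optical_axis]
    # Rotate into EVAHR coordinates (matrix-vector product).
    return [sum(row[j] * in_camera[j] for j in range(3)) for row in camera_to_evahr]


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    # Camera z axis mapped onto the EVAHR x axis (90 degree rotation about y).
    pan_90 = [[0, 0, 1], [0, 1, 0], [-1, 0, 0]]
    print(approach_vector(18.5, 5.0, identity))
    print(approach_vector(18.5, 5.0, pan_90))
```

Because only a translation is commanded, the attitude of the EVAHR is unchanged, consistent with the plan above.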

Scenario 2.2.5 

Command received by the VSP: 

Determine whether any other objects in the field of view are closer to the EVAHR than the ORU 
prior to moving toward it. 

Plan generated by the VSP: 

1. Locate the ORU as in Scenario 2.2.1 using the color camera. 

2. Estimate the distance to the ORU using the laser scanner. 

3. [...] of the region containing the ORU and report a potential obstacle [...] between the EVAHR 
and [...]. 
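
The obstacle check requested in Scenario 2.2.5 can be sketched as a scan of the range image for returns that are nearer than the ORU and that do not belong to the ORU region. This is a hedged illustration based on the command stated for the scenario, not a reproduction of the VSP plan: the range image representation (a list of rows, with None marking pixels with no return), the pixel set for the ORU region, and the function name are all assumptions.

```python
from typing import List, Optional, Set, Tuple


def find_obstacles(range_image: List[List[Optional[float]]],
                   oru_pixels: Set[Tuple[int, int]],
                   d_oru: float,
                   margin: float = 0.0) -> List[Tuple[int, int, float]]:
    """Return (row, col, range) for every return nearer than the ORU.

    range_image : rows of range values; None marks pixels with no return
    oru_pixels  : (row, col) pairs that belong to the ORU region
    d_oru       : estimated distance to the ORU
    margin      : optional tolerance subtracted from d_oru
    """
    obstacles = []
    for r, row in enumerate(range_image):
        for c, rng in enumerate(row):
            if rng is None or (r, c) in oru_pixels:
                continue
            if rng < d_oru - margin:
                obstacles.append((r, c, rng))
    return obstacles


if __name__ == "__main__":
    image = [[None, 12.0, None],
             [None, 18.5, 18.5],
             [None, None, 25.0]]
    oru = {(1, 1), (1, 2)}
    print(find_obstacles(image, oru, 18.5))
```

In this toy image the single return at 12.0 is reported as a potential obstacle, while the more distant return at 25.0 is ignored.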

2.3 Deficiencies of the Initial VSP 

[...] intensity domain and implementation on an actual mobile robot platform. 

3.0 Implementation and Studies of the Initial VSP on a Mobile Robot Platform 


3.1 Hardware Implementation of the Initial VSP 

The prototype VSP was completely developed using simulated sensors and hence the results 
obtained were under the most ideal of processing conditions. This meant that many “real world” 
issues did not arise such as those having to do with finding a specific object in cluttered visual 
backgrounds or dealing with sensor problems such as a camera that is out of focus. These 
problems are in fact significant issues that must be dealt with by the VSP, considering that there 
will be times when the earth with its oceans, clouds and continental land masses of varying colors 
will be the visual background for an object. Furthermore, the object, depending on its distance to 
the camera, may very well be out of focus. 

With the goals of testing, validating, and expanding the functionalities of the prototype Vision 
System Planner in a non-simulated environment, a Mobile Robot Platform (MRP) was developed. 
The hardware available for this testbed MRP consisted of the following components. 

1. A TRC LabMate mobile robot with three degrees of freedom provided general mobility for the 
other components. The LabMate can move about the floor (x and y translation) or rotate about 
a vertical (z) axis (Figure 9a). 

2. A rotary carousel was attached to the top of the LabMate. This provided pan/tilt capabilities for 
the camera that was mounted on it (Figure 9b). 

3. A color camera with separate red, green, and blue (RGB) output signals which was mounted 
atop the rotary carousel provided the primary sensing capability. 

4. An Image Technology image processing system was used as the primary hardware unit for 
digitizing, displaying and processing multiband images. 

5. A Silicon Graphics GTX 210 workstation hosted and controlled all of the above devices. 

It should be noted that relative to the capabilities of the prototype VSP, the hardware available 
for the testbed MRP provided only color sensing capabilities. Hence, there was no facility for 
directly sensing range images via, for example, a laser scanner. Thus, the issues relating to range 
sensing were studied separately using simulated images with the results documented in section 4. 

3.2 Complexities Introduced in Scenarios Involving Actual Color Images 

The basic outline for planning to achieve goals relative to the scenarios discussed in section 2 
was followed using the MRP with the exception that no range sensor was available. As has been 
pointed out previously, this could represent an actual situation in which a normally available range 
sensor has become nonfunctional or has been temporarily allocated for another purpose. In any 
event, for the scenarios that follow all plans are based only on information extracted from RGB 
images. 

Figure 9b: The Rotary Carousel with RGB Camera 


With this restriction, object recognition and/or pose estimation must be 
achievable using only intensity information. The lack of direct range sensing capability for 
achieving such goals should not, however, necessarily be viewed as unduly restricting the 
capabilities of the VSP, for the following reasons. First, if there are significant colored features on 
an object, they may in fact expedite recognition in preference to complex processing of dense range 
images. Second, if these identifiable features share a special geometry, then estimating pose based 
on the intensity image may be more straightforward than if the range image is used as the basis for 
pose estimation. The utility of using intensity domain features can be particularly well illustrated 
by examining the truss coupler shown in Figures 10a,b and 11a,b. 

Figures 10a and 10b show the upper conic structure of a truss coupler that has been marked 
with 5 colored regions. The outermost of these colored patterns are two red circles that serve as 
sentinel markings to delineate the boundaries of an interior pattern of encoded colored rectangles 
which in this case are orange and yellow. Relative to the task of finding an object of interest in an 
RGB image, the red circles serve as markings that tell the vision system that there may be an object 
of interest in their proximity. This is, of course, not a certainty since other objects with a color 
similar or identical to the sentinels may be observed, but such sentinel markings serve as a clue to 
assist in restricting the search neighborhood. 


The use of sentinel markings in the RGB domain was employed as a substitute for other 
methods that could be used if range images were available. To illustrate, consider a scenario in 
which an object is sought with the earth in the background. If range sensing were available, the 
detection of an object would be straightforward, assuming that the object were within the 
operational range of the sensor, since there would be no meaningful return signal from any source 
other than the target object. In the absence of a range image, however, it would be necessary to 
use only the color image to distinguish the object from a background that would consist of a 
myriad of colors. This is the fundamental purpose of the sentinel markings, but again, they are not 
sufficient by themselves to guarantee that an object of interest has been detected. 

The mechanism by which a target object is detected is to seek the interior color coded set of 
rectangles that lies between the sentinel markings. For the case of the truss coupler cone shown in 
Figures 10a and 10b, there are three such colored markings which can take on any of 4 colors 
(yellow, orange, blue, green). Hence, the identities of up to 24 (= 4 * 3 * 2) different objects 
could be encoded in this fashion, assuming that ambiguous (symmetric) patterns are to be 
avoided. This method of color encoding is analogous to bar coding the object except that no active 
scanner is required. The processing of the color bar code is performed as follows: 



1. Locate all regions in the image that have the same color as the sentinel markings. Record the 
centroids of these regions as the locations of potential sentinels. 

2. [...] 

3. [...] candidate 
pair and repeat the process starting at step 2. Otherwise, continue on to step 4. 

4. If the intervening region color combination is a known configuration, record the object's 
identity and location and go back to step 2. 
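
The identification step of the color bar code (step 4 above) reduces to a table lookup once the three bar colors between a pair of sentinels have been read. The sketch below covers only that lookup and the count of available codes. It assumes that a valid code is an ordered arrangement of three distinct colors, which matches the 4 * 3 * 2 = 24 figure quoted earlier. The object names and function names are hypothetical, and the sentinel detection and pairing steps are not modeled.

```python
from itertools import permutations
from typing import Dict, Optional, Sequence, Tuple

# The four bar colors named in the text.
BAR_COLORS = ("yellow", "orange", "blue", "green")


def build_code_table(colors: Sequence[str] = BAR_COLORS,
                     bars: int = 3) -> Dict[Tuple[str, ...], str]:
    """Assign a placeholder identity to every ordered arrangement of
    `bars` distinct colors (hypothetical object names)."""
    return {code: f"object_{i}" for i, code in enumerate(permutations(colors, bars))}


def identify(bar_colors: Sequence[str],
             table: Dict[Tuple[str, ...], str]) -> Optional[str]:
    """Return the object identity for the observed bar colors, or None if
    the combination is not a known configuration."""
    return table.get(tuple(bar_colors))


if __name__ == "__main__":
    table = build_code_table()
    print(len(table))                                   # 24
    print(identify(["yellow", "orange", "blue"], table))
    print(identify(["red", "red", "red"], table))       # None
```

Returning None for an unknown combination corresponds to rejecting a candidate sentinel pair whose intervening colors do not match any modeled object.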

Assuming that a target object's color encoded identifier is visible, the above object 
segmentation and identification algorithm works quite well as long as the object is close enough 
to the camera such that each colored bar projects onto a few hundred pixels in a well focused image. 
When such is not the case, the VSP is nevertheless able to locate the target object, but must plan 
actions that compensate for poor focus or viewing the object at large distances. In particular, there 
are three cases which cause varying degrees of complexity in the planning process when attempting 
to locate specific objects. These cases involve (a) an object in close proximity that is completely in 
focus, (b) an object in close proximity that is moderately out of focus, and (c) an object that is 
completely out of focus. The plans generated by the VSP to find the target object for each of these 
scenarios follow. 

3.3 Example Scenarios 


Scenario 3.3.1 

The truss coupler is close to the MRP and would be in focus if the camera were pointed toward 
it. 

Command received by the VSP: 

Search the forward hemisphere for the truss coupler and any other known objects and 
report the locations of these objects when finished. 

Plan generated by the VSP: 

1. [...] location. 

2. Report the locations of all recognized objects to the task planner. 

Scenario 3.3.2 

The truss coupler is close to the MRP but would be moderately out of focus if the camera were 
pointed toward it. 

Command received by the VSP: 

Search the forward hemisphere for the truss coupler and any other known objects and 
report the locations of these objects when finished. 

Plan generated by the VSP: 

1. Search the forward hemisphere for pairs of sentinel markings. [...] the markings will blend 
together and will prevent identification of the sought target object, [...] though the [...] 
were found. In this case, one of two different plans will be generated, depending upon the 
level of autonomy requested by the task planner. If the task planner permits operator 
intervention, the VSP will request that [...]. It should be noted that in a 
situation with more sophisticated hardware, this could be done automatically since the VSP 
could estimate the distance of the target object using a range finder and then adjust the 
camera's focus accordingly. On the other hand, if focusing is precluded, the VSP will 
request the task planner to move the MRP toward the target object under the assumption 
that the lens focus setting has been fixed [...] 1 meter. Since the [...] 
will know the locations [...], [...] maneuver can be executed by [...]. 

2. [...] the truss [...], which should now be in 
focus, and reports success or failure. 

Figure 12 illustrates a case in which the 
VSP successfully found an initially out of focus truss coupler after notifying the task 
planner to move the MRP toward it. 

Figure 12: Configuration of MRP Relative to Truss Coupler after Executing VSP Plan 


Scenario 3.3.3 

The truss coupler would be totally out of focus if the camera were pointed toward it, or too far 
away to be identified as even a candidate, or is not in the forward hemisphere of the MRP. 


Command received by the VSP: 

Search the forward hemisphere for the truss coupler and any other known objects and 
report the locations of these objects when finished. 


Plan generated by the VSP: 

1. As in the previous two scenarios, the truss coupler would be sought by searching for its 
sentinel markings and the appropriate intervening color code. 

2. If no candidate sentinels are found, the VSP cannot request the task planner to move the 
MRP to a more advantageous position with any degree of confidence based on observed 
data. Hence, it asks for refocusing of the camera and for the MRP to be pointed in the 
general direction of the target object. 

3. Once step 2 is performed, the actions outlined in Scenario 3.3.2 can be followed to achieve the 
desired goal. 


The above scenarios illustrate the VSP's ability to plan and execute actions that compensate for 
an out of focus camera or an object that is at a distance that makes identification difficult. Inherent 
in these actions, however, is the need to be able to estimate the distance to the object so that 
refocusing can occur or the MRP can be moved toward the object. For the current implementation, 
distance estimation was based on knowledge of the focal length of the camera and the distance 
between the sentinel markings on the target objects as embedded in the model knowledge base. 
This method suffers from two significant deficiencies, however. First, in order to estimate 
distance to the object, the camera must be relatively near the plane that is the perpendicular bisector 
of the line joining the sentinel markings. Second, and perhaps more importantly, although two 
markings are sufficient to base an estimate of distance upon, at least four markings are required for 
complete six degree-of-freedom pose estimates. With this in mind, objects like the truss coupler 
were also marked with colored features such as those shown in Figures 11a and 11b and a study of
the quality of results was undertaken. 
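
As a rough illustration of the two-marking distance estimate described above, the following Python
sketch applies the standard pinhole camera relation (range = focal length x true separation / image
separation). The function name, the use of pixel units for the focal length and the sample numbers
are assumptions made for illustration and are not taken from the implementation. The sketch is only
meaningful when the camera is near the perpendicular bisector plane of the two markings, which is
the first deficiency noted above.

```python
def estimate_distance(focal_length_px, marker_separation_m, image_separation_px):
    """Pinhole estimate of the range to a pair of markings viewed roughly head-on.

    focal_length_px     -- effective focal length expressed in pixels (assumed known)
    marker_separation_m -- true distance between the two markings, from the model
    image_separation_px -- measured distance between the markings in the image
    """
    if image_separation_px <= 0:
        raise ValueError("markings must be resolved as two distinct image points")
    return focal_length_px * marker_separation_m / image_separation_px


# Hypothetical numbers: 800 pixel focal length, markings 0.10 m apart,
# imaged 40 pixels apart.
distance_m = estimate_distance(800.0, 0.10, 40.0)
print(distance_m)
```

Two markings give range only; as the text notes, at least four are needed for a full six
degree-of-freedom pose.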


3.4 Intensity Based Pose Estimation

The technique for estimating the spatial pose of the truss coupler is based on the algorithm 
described by Hung, Yeh and Harwood. 12 This method requires three prerequisite conditions in 
order for the algorithm to be applicable. First, the effective focal length of the camera must be 
known. Second, the target object must have four coplanar points, no three of which are colinear. 
Finally, the distances between each pair of the four points must be known and recorded in the
model. If these prerequisite conditions are met, then the complete six degree-of-freedom spatial 
pose of the object can be determined by observing the locations of the four points in the image 
plane. The method involves only simple vector inner and cross products and the solution of linear 
equations. 

For the specific case of the truss coupler shown in Figures 11a and 11b, the markings shown
were placed on non-planar surfaces. However, the four outer markings shown at the corners of
the twelve marker pattern lie in the same three-dimensional plane and therefore meet the criterion 
that is necessary to apply the Hung-Yeh-Harwood algorithm. 
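
The geometric prerequisite on the model points (four coplanar points, no three of which are
colinear) can be checked before a quadruplet is accepted into the model. The sketch below is one
possible check and is not part of the Hung-Yeh-Harwood algorithm itself; the function names and the
tolerance are assumptions introduced for illustration.

```python
from itertools import combinations


def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def usable_quadruplet(points, tol=1e-9):
    """Return True if the four 3-D model points are coplanar and no three are colinear."""
    if len(points) != 4:
        return False
    # No three points may be colinear: every triangle must have non-zero area.
    for a, b, c in combinations(points, 3):
        n = cross(sub(b, a), sub(c, a))
        if dot(n, n) <= tol:
            return False
    # Coplanarity: the scalar triple product of the three edge vectors must vanish.
    p0, p1, p2, p3 = points
    triple = dot(cross(sub(p1, p0), sub(p2, p0)), sub(p3, p0))
    return abs(triple) <= tol


square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(usable_quadruplet(square))
```

The fixed tolerance is adequate only for model coordinates of roughly unit scale; a real model
knowledge base would need a tolerance tied to its units.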

Nine tests with actual images of the truss coupler were run. These tests were divided into three 
groups which varied the pose of the truss coupler by rotating it about the conic axis, translating it 
along the conic axis, and changing its distance from the camera. From these tests, two conclusions 
can be drawn that directly affect the architecture of the VSP. First, in order to minimize sensitivity 
of the pose estimate to local pixel noise, the target object should fill a large portion of the image 
plane. Second, the camera should be positioned relative to the target points such that its optical 
axis is perpendicular to the plane containing the four points. The latter condition is particularly 
important since on curved objects like the truss coupler, markings on the "horizon" of the surface
produce projected image plane coordinates that are very sensitive to minor variations in their 
extracted positions. 

These observations are relevant to the Vision System Planner because it is not known in 
advance how close the camera should be to the object in order that it should occupy a large region 
of the image plane. However, as shown in the initial study, it is possible to use range images to 
estimate the distance and to use this estimate as the basis to execute a move toward the target object 
to produce a viewpoint that is close enough. The more difficult problem is to achieve a viewpoint 
such that the optical axis is nearly perpendicular to the plane of the four markings since this would 
involve at least an approximate knowledge of the rotational parameters of the object. However, if 
the object possessed multiple quadruplets that could be uniquely identified, then there would be a 
redundancy built into the model by which its pose could be estimated. For cylindrical or conic 
objects, this technique would be particularly appropriate since their poses are uniquely determined 
by the location of a point on the axis and the orientation of the axis itself. Hence, if multiple sets
of points were evenly distributed around the cylindrical or conic section, it would always be 
possible to determine the pose of the object by selecting an “inner” set (in the image plane) that is 
most likely to satisfy the optical axis perpendicularity constraint. This method of pose estimation 
will be discussed within the context of an expanded VSP architecture later. 
4.0 Synthetic Range Image Processing



It has been previously pointed out that range image processing can directly provide information
that is useful for recognition and pose estimation, and that there are both advantages and
disadvantages to its use. Among the advantages is that three-dimensional information about an
object is available without extensive image processing (e.g. establishing stereo correspondence in
two intensity images). However, higher level range image operations such as the segmentation of
objects into their component surfaces are particularly computationally intensive since
multidimensional decoupled Hough transforms are typically required. It is therefore appropriate to
consider methods of object classification and pose estimation that make use of the best elements of
each domain, considering the following principles.

1 ' S’, Ifonly'inSy ^M^ue to which 

they may be subjected. 

2. Certain low level operations in the range domain are relatively inexpensive. Among these are 
the computation of local surface normals. 

object. 

4. Certain objects may be recognized ^ '^Kow^^^ be 

SSconlmin modern, n"=c2 in an appropnately structured recognition search tree. 

With the above principles in mind, a set of range processing primitives was developed that can 
separate observed objects into geometric classes as follows:

' ' can'appearln 

arbitrary orientations are then generated. 

2. the learning phase, these objects - 

different orientations such that °bj e ct oriente i d 1 j phase is to develop a set of 

a a nd ea g fom^ e c t^ m Lslr^ each object when attempts 

are made to recognize it later. 

3. During the recognition phase, the information extracted from the range image is compared 

It should be noted that because of the computational expense involved, no attempt is made to
determine whether curved surfaces are cylindrical, conic or otherwise. Hence, this approach only
provides a basis or framework for further refinement of the identities of the objects based on more
expensive range image processing or using features from intensity images. Because of the expense
of processing the range images, it is believed that the most effective manner to proceed is by
combining intensity based features with information gleaned from a rough classification based on
planar/nonplanar topologies.



Figure 13: Models for Range Data Processing (Cube, Cut Cylinder, Truss Coupler)


4.2 Range Image Processing


In both the learning and recognition phases it is necessary to segment the range image into planar
and non-planar regions so that the number of such regions and their respective areas can be
determined. The way that this is done is as follows:

1. Let a range image point be represented by the vector V = [x y z]. Then, if this point lies on a
plane with planar normal vector P = [a b c] and equation ax + by + cz - d = 0, clearly P · V = d.
2. Now if $V_1, V_2, \ldots, V_9$ are vectors to the range image points in a neighborhood of nine
pixels, it is possible to set up the following system to estimate a planar normal based on this
neighborhood:

Let $M = [\,V_1\ V_2\ \ldots\ V_9\,]$ be a 3 row, 9 column matrix of range image values and let
$D = [\,1\ 1\ \ldots\ 1\,]$ be a row vector containing 9 1's.

Then, if the 9 points all belong to the same plane with equation $ax + by + cz = d$ (with
$[\,a\ b\ c\,]$ scaled so that the right hand side is 1),

\[ [\,a\ b\ c\,]\, M = D \]

and it is possible to compute a least squares solution for $[\,a\ b\ c\,]$ by using a pseudo-inverse,
by observing that

\[ [\,a\ b\ c\,]\, M M^{T} = D M^{T} \]
\[ [\,a\ b\ c\,]\, (M M^{T}) (M M^{T})^{-1} = D M^{T} (M M^{T})^{-1} \]
\[ [\,a\ b\ c\,] = D M^{T} (M M^{T})^{-1} \]

The value of d can then be computed as $d = ax + by + cz$, where $[\,x\ y\ z\,]$ is the value of the
central range pixel.
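
The least squares normal computation above can be sketched in a few lines of Python. This is a
minimal illustration and not the implementation used in the study: the function names are
hypothetical, the 3 x 3 system is solved with Cramer's rule to avoid external libraries, and the
final normalization to a unit normal before computing d is an assumption about how the scale of
[a b c] is handled. The fit fails (singular matrix) for a plane passing through the sensor origin.

```python
import math


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Solve [a b c] (M M^T) = D M^T for a neighborhood of (x, y, z) range points."""
    # A = M M^T is the 3 x 3 sum of outer products; rhs = D M^T is the sum of the points.
    A = [[sum(p[i] * p[j] for p in points) for j in range(3)] for i in range(3)]
    rhs = [sum(p[i] for p in points) for i in range(3)]
    d0 = det3(A)
    if abs(d0) < 1e-12:
        raise ValueError("M M^T is singular; the plane may pass through the origin")
    coeffs = []
    for k in range(3):
        # A is symmetric, so the row-vector system can be solved column by column.
        Ak = [[rhs[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]
        coeffs.append(det3(Ak) / d0)
    return tuple(coeffs)


def unit_normal_and_offset(coeffs, centre):
    """Normalize [a b c] and evaluate d = ax + by + cz at the central range pixel."""
    norm = math.sqrt(sum(c * c for c in coeffs))
    normal = tuple(c / norm for c in coeffs)
    d = sum(n * v for n, v in zip(normal, centre))
    return normal, d


# Synthetic 3 x 3 neighborhood lying on the plane z = 2.
patch = [(x, y, 2.0) for y in (-1.0, 0.0, 1.0) for x in (-1.0, 0.0, 1.0)]
coeffs = fit_plane(patch)
normal, d = unit_normal_and_offset(coeffs, patch[4])
print(coeffs, normal, d)
```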

In order to determine where there are planar regions within the range image, plane equations are
computed across a grid that effectively overlays the range image, and those neighborhoods that
contributed to (nearly) identical plane equations are collected into a single planar region.
Surfaces are classified only as planar or non-planar due to the associated computational cost. The
following features relating to the observed object are then extracted:

1. the number of visible planar surfaces (≥ 0)

2. the number of visible curved surfaces (= 0 or 1)

3. the area of the visible planar surface with the smallest area

4. the area of the visible planar surface with the largest area 

5. the total visible planar area 

6. the total visible surface area for the entire object 
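
The six features listed above could be collected from a segmented view roughly as follows. The
`Surface` record and the dictionary keys are hypothetical names chosen for this sketch; the text
does not say how the features were stored. Curved regions are not separated from one another, so
the curved count is reported only as 0 or 1, as in item 2.

```python
from dataclasses import dataclass


@dataclass
class Surface:
    planar: bool   # True for a planar region, False for a non-planar region
    area: float    # visible area of the region


def view_features(surfaces):
    """Compute the six view dependent features from a list of segmented surfaces."""
    planar_areas = [s.area for s in surfaces if s.planar]
    has_curved = any(not s.planar for s in surfaces)
    return {
        "num_planar": len(planar_areas),
        "num_curved": 1 if has_curved else 0,
        "min_planar_area": min(planar_areas) if planar_areas else 0.0,
        "max_planar_area": max(planar_areas) if planar_areas else 0.0,
        "total_planar_area": sum(planar_areas),
        "total_area": sum(s.area for s in surfaces),
    }


# Hypothetical view: two planar faces and one curved region.
features = view_features([Surface(True, 4.0), Surface(True, 2.5), Surface(False, 6.0)])
print(features)
```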
The next subsection demonstrates how these features are used as the basis for learning
about the topology and geometry of the object.

4.3 Learning Model Features

The purpose of the learning phase is to embody in the model description viewpoint dependent
constraints that facilitate rapid recognition when a global description of an object is constructed
or used. To illustrate using the objects of Figure 13, if three planar surfaces are ever viewed, then
it can be categorically stated that the observed object is a cube, since the maximum number of
visible planar surfaces for the cut cylinder and truss coupler are 2 and 1, respectively. If a curved
surface is observed, then the object cannot be the cube and it will be necessary to use other
discriminating feature(s) to determine whether it is the cut cylinder or the truss coupler. In these
kinds of cases it is necessary to have a global descriptor of each object that describes its topology
and geometry. It is, however, important to judiciously select a set of viewpoints that sufficiently
constrains the potentially observable features so that it is known that a cube will always have at
least 1 and at most 3 visible planar surfaces while the truss coupler and cut cylinder will have at
most 1 and 2, respectively.

The learning mechanism actually employed can be illustrated by examining the image sequences
shown in Figures 14-16.

Figure 14a shows the cube in a view in which three of its planar faces are visible. After the range
image is processed, the normals shown in Figure 14b are derived. The areas of these planar faces and
the total visible area are recorded in the initial model for the cube. This is followed by the two
examples shown in Figures 14c and 14d, which demonstrate that certain viewpoints relative to the
object may result in fewer than 3 planar faces being visible.

Figures 15a and 15b cause the cut cylinder model to record that two planar surfaces and one curved
surface may be visible. Figures 15c and 15d show that only one planar surface may be visible, and
Figure 15e demonstrates that no curved surface may be visible. In addition, another view that
presents only the half-cylindrical side is added to the model so that it is known that no planar
regions may be visible.

The same process for the truss coupler is illustrated by Figures 16a-16d, whose model represents
knowledge that one or no planar or curved surfaces may be visible.
Figure 14a: Cube Showing Three Planar Faces



Figure 14b: Cube Showing Surface Normals for Three Planar Faces 






Figure 14c: Cube Showing Surface Normals for Two Planar Faces 



Figure 14d: Cube Showing Surface Normals for a Single Planar Face 





Figure 15a: Cut Cylinder Showing One Curved and Two Planar Surfaces



Figure 15c: Cut Cylinder Showing One Planar and One Curved Surface



Figure 15d: Cut Cylinder Showing Surface Normals for Visible Planar Surface 



Figure 16a: Truss Coupler Showing One Planar Face and Curved Surfaces



Figure 16b: Truss Coupler Showing Surface Normals for Visible Planar Face


Figure 16c: Truss Coupler Showing Only Curved Surfaces 



Figure 16d: Truss Coupler Showing Absence of Planar Surface Normals 






4.4 Range Image Based Object Recognition 


The learned features described in section 4.2 were used as the basis for discriminating among 
the cube, cut cylinder and truss coupler in various spatial poses like those illustrated by Figures 14- 
16. The system performed as expected, with the most useful features being the numbers of visible 
planar and curved surfaces and their respective areas. For certain views that did not produce a rich 
set of useful features, the system was unable to determine the identity of the viewed object. For 
example, the semi-circular planar region of the cut cylinder and the circular region of the truss 
coupler are very nearly the same area. Hence, if only these surfaces are visible, the system as it 
currently exists cannot distinguish between them and additional differentiators are necessary. 
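
A simplified picture of how the learned surface counts prune the candidate list is sketched below.
The table of minimum and maximum visible planar and curved surface counts is an assumption read
from the views in Figures 14-16, and the area comparisons used by the actual system are omitted, so
this is an illustration of the idea rather than the recognition procedure itself.

```python
# Assumed (min, max) visible surface counts per model, read from Figures 14-16.
LEARNED_COUNTS = {
    "cube": {"planar": (1, 3), "curved": (0, 0)},
    "cut cylinder": {"planar": (0, 2), "curved": (0, 1)},
    "truss coupler": {"planar": (0, 1), "curved": (0, 1)},
}


def candidates(num_planar, num_curved):
    """Return the models whose learned count ranges admit the observed view."""
    matches = []
    for name, ranges in LEARNED_COUNTS.items():
        lo_p, hi_p = ranges["planar"]
        lo_c, hi_c = ranges["curved"]
        if lo_p <= num_planar <= hi_p and lo_c <= num_curved <= hi_c:
            matches.append(name)
    return matches


print(candidates(3, 0))  # three planar faces: only the cube remains
print(candidates(1, 1))  # one planar and one curved: cube is rejected
print(candidates(1, 0))  # a single planar face alone is ambiguous
```

The last call shows the ambiguity discussed above: a single planar face cannot be resolved by
counts alone, so areas or other differentiating features are needed.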

One possibility for such differentiating features would be to examine not only the topology of 
each surface but the structure of its bounding edges. Since the cut cylinder face is bounded by 
edges that are linear and semi-circular and the truss coupler surface is bounded by circular edges, 
these would be sufficient additional features to distinguish between these objects if such limited 
viewpoint dependent information were available. For certain cases, however, it will be necessary 
to obtain additional views, perhaps using a different sensor, in order to recognize an object or to 
estimate its pose. The augmentation of the current system with additional features and viewpoints 
and the combining of intensity and range domains are the subject of the next section.
5.0 Vision System Planner Recommendations


As the result of experiments undertaken with both actual and synthetic sensor data, the 
following general principles were developed and used as the basis for recommending modifications 
to the Vision System Planner architecture. Figures 17-20 illustrate the basic recommended flows 
for sensor/algorithm decisions relating to object detection, recognition, pose estimation and 
iteratively improving the confidences of recognition and pose estimation. 


1. For the detection of objects without recognition and/or pose estimation, the most appropriate
sensor to select under general circumstances is the laser scanner. The primary reason for this
conclusion is that segmentation of objects as viewed by a color camera becomes extremely
difficult if a colorful background (e.g. the earth) is present. Hence, in the most general cases,
it is preferable to attempt to detect anomalies in depth data rather than in color images.


2. For the recognition of objects a two pronged approach that combines range and intensity
images is advantageous. From a geometric structural approach, the recognition of generalized
curved objects from range images is too computationally expensive, even to perform surface
segmentation. However, as was shown in the previous section, a limited amount of range
processing to extract the planar surfaces and their areas can provide the basis to group objects
into broad categories. Once this is done, key intensity features can be examined either in the
reflectance image for the laser scanner or in that of the color camera to refine the identity of the
object. For example, suppose that a planar and a non-planar surface are extracted from a range
image. This is a situation that could arise if either the cut cylinder or the truss coupler were
viewed as in Figures 15d and 16b. The resulting confusion between the two models could be
resolved simply in the intensity image by noting that the planar surface of the truss coupler is
not bounded by any straight line segments, whereas the planar surface of the cut cylinder has a
semicircular and a linear boundary. In another case, colored (bar code) patterns could be used
as the discriminating factor. Hence, applying information obtained from both range and
intensity domains would differentiate the two models.


3. The manner by which spatial pose is estimated should be a function of the visible surface
characteristics for the observed object. If only planar surfaces are observed, then the vertices at
which these planar surfaces intersect can provide sufficient features upon which to compute
pose using the locations of these features as extracted from the range image. For curved
objects such as the truss coupler, however, computationally difficult problems relating to the
extraction of surface type (e.g. cylinder, cone, etc.) arise. It is therefore advisable to consider
using markings on the objects that facilitate applying one or more of the computationally simple
intensity domain pose estimation algorithms such as that of Hung, Yeh and Harwood. An
object like the truss coupler could be marked redundantly such that at least four coplanar
noncolinear feature points would always be visible. This approach avoids problems associated
with the computation of parameters for curved surfaces and potential occlusion of a single set
(of 4) pose estimation feature points.

4. Finally, there may be cases for which there is a low confidence for the identity or estimated
pose of an object as determined above. This may be due to a goal that minimizes processing in
order to avoid computationally expensive feature extraction during the recognition phase.
However, if a reliable pose has been estimated, the location of features in each image may be
predicted and the search space can be considerably constrained. Hence, it is appropriate to
loop back through both the recognition and pose estimation phases, further refining estimates of
feature correspondences and pose parameters based on predicted and verified features.
Figure 17: Object Detection





Figure 18: Object Recognition (flowchart: Start; process range image, extract planar and
non-planar surface characteristics, provide initial discrimination; branch on "only planar
surfaces visible" or "one or more curved surfaces visible")






Figure 19: Spatial Pose Estimation 







Figure 20: Object Detection, Recognition, and Pose Estimation (flowchart, including refinement of
recognition/pose confidence based on model based predictions in intensity and/or range images)



The combination of the above planning paradigms as illustrated by Figure 20 is intended to be
an iterative process by which a confidence measure is computed based on the currently estimated
pose and identity of an object. The confidence measure should generally be based upon the
computed error (e.g. RMS) between observed feature locations in the image and their locations as
predicted by the current pose/identity hypothesis, and it should be iteratively refined as additional
features are sought in new views. In essence, this is analogous to what happens when the Vision
System Planner seeks an out-of-focus truss coupler in scenario 3.3.2. However, the fact that the
VSP continues its search for the truss coupler is currently motivated only by the fact that color bar 
combinations are sought. There is a clear need to use features from range images (e.g. 
planar/nonplanar surfaces, surface area, etc.) to prune the recognition tree and to couple pose 
estimation algorithms to predict image features that could be used to modify confidences in the 
iterative process. 
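
The RMS error on which the confidence measure would be based can be computed as in the sketch
below. The RMS computation follows the description above; the mapping from RMS error to a
confidence value in (0, 1], including the `scale_px` parameter, is purely a hypothetical choice for
illustration, since the text does not define one.

```python
import math


def rms_error(observed, predicted):
    """RMS distance between observed and predicted (x, y) image feature locations."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need matching, non-empty feature lists")
    squared = [(ox - px) ** 2 + (oy - py) ** 2
               for (ox, oy), (px, py) in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def confidence(rms, scale_px=5.0):
    """Hypothetical mapping: 1.0 for a perfect fit, 0.5 when rms equals scale_px."""
    return 1.0 / (1.0 + (rms / scale_px) ** 2)


observed = [(10.0, 10.0), (20.0, 20.0)]
predicted = [(13.0, 14.0), (17.0, 16.0)]
error = rms_error(observed, predicted)
print(error, confidence(error))
```

As further features are predicted and verified in new views, the two lists grow and the measure
can be recomputed on each pass through the loop.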

For example, when the Vision System Planner was seeking the truss coupler as described for 
the various scenarios in section 3, color images were the only available sensory input. Two of the 
three objects in front of the Mobile Robot Platform in Figure 12 had only planar surfaces visible 
and one showed a curved surface. Of course, surface curvature is not easily determined from 
intensity images. However, as was demonstrated in section 4, if a laser range finder had been 
available the cubes would have been immediately rejected based on their surface characteristics. 
Had this capability existed, a tentative identification of the object as a truss coupler could have been 
made. Based on the tentative identification from range image features, the model knowledge base 
would have revealed that its spatial pose could be estimated based on four coplanar non-colinear 
(i.e. Hung-Yeh-Harwood) feature points in the intensity image. The estimated spatial pose in 
tandem with knowledge of the sensor models would have made it possible to backproject the object 
model into the intensity and/or range images to predict other features or to modify the confidence of
the recognition/pose combination. Hence, combining information from both sensory domains
would provide a capability that is greater than the sum of the capabilities strictly obtainable from
intensity or range sensory domains and algorithms.
6.0 Summary and Conclusions


In order to increase the autonomy of the Extravehicular Activity Helper/Retriever, it is
necessary for the Vision System Planner to be able to select an image sensor or invoke an image
processing algorithm that will achieve a goal in an expeditious manner. The primary criteria for
selecting the sensor or algorithm should be based upon 

a. what is known about the object being sought (the object model), 

b. what is known about the operational environment (the world model), 

c. what is known about the capabilities of the sensor, and 

d. what is known about the capabilities of the processing algorithm. 

The results obtained using actual images from a color camera mounted on the Mobile Robot 
Platform and synthetically generated range images demonstrate that each sensory domain has 
inherent strengths that should be exploited and inherent weaknesses that should be avoided when 
circumstances warrant. More important, however, is the capability of each sensory domain to 
complement or enhance the capability of the other, particularly if an approach to iteratively refine 
the confidences associated with identification and pose is taken. 
Abstract

The Extravehicular Activity Retriever (EVAR) is a robotic device currently being 
developed by the Automation and Robotics Division at the NASA Johnson Space Center to 
support activities in the neighborhood of the Space Shuttle or Space Station Freedom. As the 
name implies, the Retriever's primary function will be to provide the capability to retrieve 
tools and equipment or other objects which have become detached from the spacecraft, but it 
will also be able to rescue a crew member who may have become inadvertently de-tethered. 
Later goals will include cooperative operations between a crew member and the Retriever 
such as fetching a tool that is required for servicing or maintenance operations. 

This paper documents a preliminary design for a Vision System Planner (VSP) for the 
EVAR that is capable of achieving visual objectives provided to it by a high level task 
planner. Typical commands which the task planner might issue to the VSP relate to object 
recognition, object location determination, and obstacle detection. Upon receiving a 
command from the task planner, the VSP then plans a sequence of actions to achieve the 
specified objective using a model-based reasoning approach. This sequence may involve 
choosing an appropriate sensor, selecting an algorithm to process the data, reorienting the 
sensor, adjusting the effective resolution of the image using lens zooming capability, and/or 
requesting the task planner to reposition the EVAR to obtain a different view of the object. 

An initial version of the Vision System Planner which realizes the above capabilities 
using simulated images has been implemented and tested. The remaining sections describe 
the architecture and capabilities of the VSP and its relationship to the high level task planner. 
In addition, typical plans that are generated to achieve visual goals for various scenarios are 
discussed. Specific topics to be addressed will include object search strategies, repositioning 
of the EVAR to improve the quality of information obtained from the sensors, and 
complementary usage of the sensors and redundant capabilities. 

1. Introduction 

There has been considerable research that relates to the development of specialized robotic 
devices that are designed to operate in extraterrestrial domains. These devices cover a broad
Figure 1: Planning System Architecture



Figure 2: Vision System Components 























communicating the results or a request for assistance to the Task Planner. The remaining 
sections discuss a suggested architecture for such a Vision System within the context of the 
Extravehicular Activity Retriever. 


2. Vision System Planner Design Considerations 

The planning mechanisms developed are founded on the assumption that there should be 
at least two visual sensors which provide intensity (color) and range images. There are 
several reasons why such a multisensory approach is desirable, three of which are 
particularly significant. First, the availability of sensors with complementary capabilities 
permits the VSP to select a sensor/algorithm combination that is most appropriate for
achieving the current visual goal as specified by the task planner. Second, if the sensor that
the VSP would normally select as its first choice to achieve the goal is either unavailable or 
inappropriate tor usage because of some current constraint, it may be possible to perform the 
desired task using the other sensor to achieve the same goal, albeit perhaps by accepting a
penalty in performance. Finally, instances may occur for which it is desirable to verify 
results from two different sensory sources. 

The first of the above motivations addresses achieving the visual goal in the most 
effective manner by allowing the VSP to choose among sensors with complementary 
capabilities. For example, if it is desired to distinguish between two objects of similar 
structure with the color of the objects being the primary differentiating feature, then it is 
apparent that the color camera should be used as the primary sensor. On the other hand, if
the size and/or geometry of the objects are most useful for determining identity, then it is 
important to be able to expeditiously extract and process three-dimensional coordinates. 
Clearly, this is a task that would be most properly assigned to the laser scanner. Similarly, 
tasks involving pose estimation, object tracking 8 and motion estimation 9 would more
appropriately involve invoking the laser scanner as the primary sensor. The initial versions
of these submodules have already been developed and will be tested in a reduced gravity
environment using NASA's KC-135 aircraft during the coming year.

The previous example involving the need for three-dimensional coordinates is illustrative 
of a case in which the primary sensor (the laser scanner) is engaged to extract the required 
information. However, there may be cases for which the laser scanner cannot be used to 
obtain range information because (a) the object to be processed is covered with a highly 
specularly reflective material thus preventing acquisition of good return signals, (b) the laser 
scanner is currently assigned to another task, or (c) the laser scanner is temporarily not 
functioning properly. For such instances, it is highly desirable to provide a redundant 
capability by using the other sensor if possible. The classical method for determining
three-dimensional coordinates from intensity images involves a dual (stereo vision) camera 
setup in which feature correspondences are established and the stereo equations are solved 
for each pair of feature points. Although the current simulated configuration has only one 
intensity image camera, this alternative mechanism for computing range values is in fact 
possible for the VSP to achieve by requesting the task planner to reposition the EVAR such 
that the camera's initial and final positions are offset by a known baseline distance. Of
course there is a penalty in performance if the (pseudo) stereo vision method is chosen, 
since the EVAR must be moved and feature correspondences computed. However, it is 
nevertheless important to have such a redundant sensing capability for the reasons previously 
mentioned and to be able to independently verify the results obtained from one sensor or to 
increase the confidence of those results.
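
For the (pseudo) stereo alternative, the range to a matched feature follows from the standard
stereo relation once the baseline is known. The sketch below assumes the simplest geometry, in
which the EVAR translates the camera sideways by the baseline without rotating it; the function
name, pixel units and sample values are assumptions for illustration only.

```python
def depth_from_disparity(focal_length_px, baseline_m, x_first_px, x_second_px):
    """Range to a feature seen at x_first_px before and x_second_px after the move.

    Assumes a pure sideways translation of the camera by baseline_m, with the
    optical axis direction unchanged between the two views.
    """
    disparity = x_first_px - x_second_px
    if disparity == 0:
        raise ValueError("zero disparity: feature is effectively at infinite range")
    return focal_length_px * baseline_m / abs(disparity)


# Hypothetical values: 800 pixel focal length, 0.5 m baseline, 40 pixel disparity.
range_m = depth_from_disparity(800.0, 0.5, 120.0, 80.0)
print(range_m)
```

The performance penalty mentioned above lies outside this calculation: the EVAR must be moved
and the feature correspondence established before the relation can be applied.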

Aside from selecting an appropriate sensor, it may also be possible to alter certain
physical characteristics of the sensor such as the effective resolution and scanning rate. In
the case of the laser scanner, images can be acquired at rates (and resolutions) varying
between 2.5 frames per second (256 x 256 pixels) and 10 frames per second (64 x 256 pixels).
The capability to select a faster frame rate with a penalty in resolution becomes significant if it
is important to be able to sense and process data rapidly, as in the case of motion estimation.
On the other hand, if an object is relatively stationary and finer features are to be sensed, then
higher resolution with a lower frame rate would be chosen. Hence, a vision system planner
should be able to select a sensor as well as its relevant parameters (e.g. scanning rate,
resolution, zoom factor, orientation).
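
The trade-off between frame rate and resolution can be expressed as a small selection rule. The
sketch below treats the two quoted operating points as the only available modes, which is an
assumption (the text says rates vary between them), and the names `LASER_MODES` and
`choose_laser_mode` are hypothetical.

```python
# The two operating points quoted in the text; intermediate modes are not modelled.
LASER_MODES = [
    {"frames_per_second": 2.5, "size": (256, 256)},
    {"frames_per_second": 10.0, "size": (64, 256)},
]


def choose_laser_mode(required_fps):
    """Pick the highest resolution mode that still meets the required frame rate."""
    feasible = [m for m in LASER_MODES if m["frames_per_second"] >= required_fps]
    if not feasible:
        return None  # no mode is fast enough; the planner must relax the goal
    return max(feasible, key=lambda m: m["size"][0] * m["size"][1])


print(choose_laser_mode(1.0))   # stationary object: full resolution
print(choose_laser_mode(5.0))   # motion estimation: faster, coarser mode
```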

Once an appropriate sensor has been selected and configured, the next step is to focus
attention on the object(s) and to apply a preprocessing algorithm that will effectively achieve
the goal. Focussing of attention is important because it reduces the amount of image
data that must be processed for the immediate task. If the task is tracking an image blob that
corresponds to an object of interest, and the image blob merges with another blob or
disappears due to occlusion, then the object's predicted location (computed by the adaptive
tracker) is central to assisting in segmentation of sub-blobs.

The choice of a pose estimation algorithm is directly dependent on the model being
processed. There are two fundamental classes of algorithms that are currently employed,
namely object-based and image-based (multi-view) pose estimation. If an object contains
curved surfaces (e.g. a cylinder) then an image-based approach is taken by which the
occluding contours derived from several views of the object that were recorded on a
tesselated sphere are used as the basis for matching the observed object's outline. If the
object has a polyhedral structure (no curved surfaces) then an object-based pose estimation
approach is used in which features extracted from images are matched against model
features in a CAD data base. For situations in which the object is very close to the sensor,
pose is estimated based on subparts of the entire object rather than the
entire object. Similarly, for purposes of recognition, the subset of object features selected
and the algorithm chosen are also a function of the size of objects in images.

Proximity to target objects will affect not only the features selected for recognition and
pose estimation but will also strongly influence the confidences associated with the results
computed. For example, a typical scenario might involve a case in which the EVAR is close
enough to a target object to hypothesize its class based on color but too far to
definitively recognize its geometric structure using laser scanner data. In this case the VSP
would tentatively identify the object (using color) and advise the task planner to move
closer to the object so that a laser scanner image with higher resolution can be obtained. The
confidence of the initial hypothesis would then be strengthened (or perhaps weakened)
depending on the conclusions reached by processing the range data at close proximity. This
capability is illustrative of the necessity for the VSP to be able to plan high level vision tasks
and to interact with the higher level task planner in order to
reposition the EVAR. Hence, at the highest level of vision system planning the VSP will be
responsible for task scheduling and resource planning.

The experimental architecture for the Vision System includes modules which are designed
to detect, recognize, track, and estimate the pose of objects. Upon receiving a request from
the main task planner to achieve one of these objectives, the Vision System Planner
determines an appropriate sequence of goals and subgoals that, when executed, will
accomplish the objective. The plan generated by the VSP will generally involve (a) choosing
an appropriate sensor, (b) selecting an efficient and effective algorithm to process the image
data, (c) communicating the nominal (expected) results to the task planner or informing the
task planner of anomalous (unexpected) conditions or results, and (d) advising the task
planner of actions that would assist the vision system in achieving its objectives. The
specific plan generated by the VSP will primarily depend on knowledge relating to the sensor
models (e.g. effective range of operation, image acquisition rate), the object models,
and the world model (e.g. expected distance to and attitude of
objects). The next section presents the resulting plans generated by the VSP for several
different scenarios.
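
Steps (a), (b), and (d) of the plan can be sketched as a small data structure and a selection function. This is a minimal sketch, not the VSP's implementation: `SensorModel`, `VisionPlan`, `make_plan`, the range fields, and the algorithm lookup table are hypothetical names, and any range values passed in are illustrative. Step (c), reporting results, happens after execution and is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SensorModel:
    name: str
    min_range_ft: float  # assumed lower bound of effective range
    max_range_ft: float  # assumed upper bound of effective range


@dataclass
class VisionPlan:
    sensor: str
    algorithm: str
    advice: List[str] = field(default_factory=list)


def _gap(sensor: SensorModel, distance_ft: float) -> float:
    """Distance by which the target lies outside the sensor's range (<= 0 if inside)."""
    return max(sensor.min_range_ft - distance_ft, distance_ft - sensor.max_range_ft)


def make_plan(objective: str, expected_distance_ft: float,
              sensors: List[SensorModel], algorithms: Dict[str, str]) -> VisionPlan:
    # (a) choose the first sensor whose effective range covers the target
    for sensor in sensors:
        if _gap(sensor, expected_distance_ft) <= 0:
            # (b) select the algorithm registered for this objective
            return VisionPlan(sensor.name, algorithms[objective])
    # (d) nothing is in range: pick the closest fit and advise repositioning
    nearest = min(sensors, key=lambda s: _gap(s, expected_distance_ft))
    return VisionPlan(nearest.name, algorithms[objective],
                      advice=["reposition so the target is within sensor range"])
```

The sensor list order acts as a simple preference ranking; a real planner would presumably also weigh acquisition rate and object properties, as the paragraph above notes.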

3. Scenarios and Results 




[...] (in consultation with the human operator) selects an angular field of
view (i.e. a zoom factor) [...]. If the ORU is not found, the extreme sectors (7-14)
are searched in that order.



Figure 3: hemispherical sector search order 
The scenarios below cover situations involving object detection, recognition,
range estimation, and obstacle notification.

3.1 Scenario 1 


Command received by the VSP: Search in front of the EVAR for an ORU.

Plan generated by the VSP:

[...] is centered and large in the image [...]. [...] a recommendation is made
[...] the desirability in terms of [...] power consumption by the
illumination source). Finally, the VSP [...] rotate the EVAR by 180 degrees [...].



Teleoperator command: RGB search for ORU 

Field of view angle = 50° 
Scan angle = 45° 


Figure 4a: search of sector 1 for ORU 



VSP response: ORU was found in sector 2 

Area of object was 150 pixels 


Figure 4b: search of sector 2 for ORU 



VSP action: Reorienting camera gimbals 

Setting field of view to 7° 

Figure 4c: first gimbal and zoom refinement



Setting field of view to 4° 


Figure 4d: second gimbal and zoom refinement





Teleoperator command: estimate range to ORU using laser scanner 

VSP response: estimated range to ORU is 18.5 feet

Figure 5: laser scanner range estimation 



Teleoperator command: estimate range to ORU using pseudo-stereo 

VSP response: estimated range to ORU is 18.5 feet

Figure 6: pseudo-range estimation



Teleoperator input: move EVAR along optical axis 


Figure 7: moving EVAR toward the ORU



Teleoperator input: check for obstacles in field of view 

VSP action: obstacle located (identified by cursor) 

Figure 8: checking for obstacles prior to moving EVAR


3.2 Scenario 2 


Command received by the VSP: Determine the distance to the ORU, no sensor specified.

Plan generated by the VSP:

1. Locate the ORU as in Scenario 1 using the color camera.
2. [...] determine which sensor is the most [...]. [...] the ORU is not specularly
   reflective, the laser scanner is chosen.
3. [...] the ORU [...] image [...] region [...] image and compute the distance to those
   range image elements (Figure 5).

3.3 Scenario 3 

Command received by the VSP:

[...] the ORU [...] distance using single [...]

Plan generated by the VSP:

1. Locate the ORU as in Scenario 1 using the color camera.
2. Move the EVAR left [...], record the location of the ORU in that image. Then move
   the EVAR [...] image, and record the location of the ORU in that image.
3. [...]
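
The two image locations recorded in step 2 can be converted to a range estimate by standard stereo triangulation. The sketch below is an assumption about how such a computation could look, not necessarily what the VSP does: it assumes a pinhole camera, a purely lateral translation between the two views with no rotation, and a stationary ORU. The function and parameter names are hypothetical.

```python
def pseudo_stereo_range(x_first_px: float, x_second_px: float,
                        baseline_ft: float, focal_length_px: float) -> float:
    """Estimate range from two views taken by one laterally translated camera.

    x_first_px, x_second_px: horizontal image coordinate of the ORU in each view.
    baseline_ft: lateral distance the EVAR moved between the two views.
    focal_length_px: camera focal length expressed in pixels (assumed known).
    """
    disparity_px = abs(x_first_px - x_second_px)
    if disparity_px == 0:
        raise ValueError("zero disparity: target too distant or baseline too small")
    # Similar triangles: range / baseline = focal length / disparity
    return focal_length_px * baseline_ft / disparity_px
```

As a made-up example, a focal length of 925 pixels, a 2 foot baseline, and a 100 pixel disparity give 925 × 2 / 100 = 18.5 feet.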


3.4 Scenario 4


Command received by the VSP:

[...] of the color camera until the EVAR is a [...]

Plan generated by the VSP:

1. Locate the ORU as in Scenario 1 using the color camera.
2. Estimate the distance to the ORU (D_oru) using the laser scanner.
3. Compute [...] position, maintaining the same attitude (Figure 7).
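
The position computation in step 3 can be sketched as a translation along the camera's optical axis. This sketch assumes the command specifies a desired standoff distance from the ORU, which is a guess about the command's content; `target_position`, `standoff`, and the coordinate convention are hypothetical. Attitude is untouched because only position changes.

```python
import math
from typing import Sequence, Tuple


def target_position(position: Sequence[float], optical_axis: Sequence[float],
                    d_oru: float, standoff: float) -> Tuple[float, ...]:
    """Return the position reached by moving along the optical axis.

    position: current EVAR position (any consistent units, e.g. feet).
    optical_axis: direction of the color camera's optical axis (need not be unit length).
    d_oru: estimated distance to the ORU along that axis.
    standoff: assumed desired final distance from the ORU.
    """
    norm = math.sqrt(sum(a * a for a in optical_axis))
    if norm == 0:
        raise ValueError("optical axis must be a nonzero vector")
    travel = d_oru - standoff  # negative means back away from the ORU
    return tuple(p + travel * a / norm for p, a in zip(position, optical_axis))
```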

3.5 Scenario 5 

Command received by the VSP:

[...] determine whether any other objects in the field of view are closer to the EVAR
than the ORU prior to moving toward it.

Plan generated by the VSP:

1. Locate the ORU as in Scenario 1 using the color camera.
2. Estimate the distance to the ORU using the laser scanner.
3. [...] the ORU and report a potential obstacle between the EVAR and the ORU. The
   cursor in Figure 8 shows the potential obstacle.
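
The obstacle check amounts to scanning the range image for elements closer than the ORU. The sketch below is one possible reading of that step, not the VSP's code: the range image is assumed to be a 2D list of distances with `None` for missing returns, and `find_obstacles` and `margin_ft` are hypothetical names.

```python
from typing import List, Optional, Tuple


def find_obstacles(range_image: List[List[Optional[float]]], d_oru_ft: float,
                   margin_ft: float = 0.0) -> List[Tuple[int, int, float]]:
    """Return (row, col, distance) for range elements closer than the ORU.

    margin_ft: assumed tolerance so that points on the ORU's own near
        surface are not reported as obstacles.
    """
    hits = []
    for row_index, row in enumerate(range_image):
        for col_index, distance in enumerate(row):
            if distance is not None and distance < d_oru_ft - margin_ft:
                hits.append((row_index, col_index, distance))
    return hits
```

A non-empty result would be reported to the task planner before any motion toward the ORU, and the returned image coordinates are what a display cursor could point at.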





4. Conclusions and Future Work 

The VSP has been implemented and tested using simulated input from a color camera and
a laser scanner. The VSP has been shown to be capable of planning the fundamental tasks
required for the vision system, namely object detection, recognition, range estimation,
obstacle detection, and advising the task planner for repositioning of the EVAR. The next
phase will involve implementing the VSP on robotic hardware that is [...] sensor(s) and
moving about in its environment in a laboratory setting. Beyond moving from simulated
environments to physical domains, the primary goal will be to increase the autonomy of the
VSP such that less teleoperator input is required.